This folder contains functions that divide the data set into training and validation sets for k-fold cross-validation (k defaults to 10).

Data points should already be structured in a matrix X where each row is a separate data point. A column vector y should contain the category (or class label, e.g. plus10) of each corresponding data point. Use the functions in the INPUT folder to read in and parse the data appropriately.

A column vector ‘categories’ is also defined; it simply lists the class label values in use. The code does not make any assumptions about the values used for class labels. By default, ‘categories’ contains [0, 1] as the two categories.
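
As an illustration of the expected layout, here is a minimal sketch with made-up toy values. The numbers are hypothetical and only show the shapes of X, y and categories described above; real data would come from the INPUT folder functions.

```matlab
% Hypothetical toy data: 6 data points with 2 features each (one row per point).
X = [1.0  2.0;
     1.5  1.8;
     5.0  8.0;
     8.0  8.0;
     1.0  0.6;
     9.0 11.0];

% One class label per row of X, stored as a column vector.
y = [0; 0; 1; 1; 0; 1];

% Class label values in use, stored as a column vector.
categories = [0; 1];

% Sanity checks: one label per data point, and every label is a known category.
assert(size(X, 1) == size(y, 1));
assert(all(ismember(y, categories)));
```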

getVecsPerCat.m - Counts the number of vectors belonging to each category.
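
The counting step can be sketched as follows. This is not the code of getVecsPerCat.m, whose exact signature is not shown here; the variable name vecsPerCat and the toy labels are assumptions for illustration.

```matlab
y = [0; 0; 1; 1; 0; 1; 0];   % hypothetical labels
categories = [0; 1];

% vecsPerCat(i) = number of entries of y equal to categories(i).
vecsPerCat = zeros(numel(categories), 1);
for i = 1:numel(categories)
    vecsPerCat(i) = sum(y == categories(i));
end
```

Because the labels are only compared with ==, this works for any numeric label values, not just 0 and 1.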

computeFoldSizes.m - Pre-computes the size of each of the k folds for each category. The number of vectors in a category might not be evenly divisible by the number of folds, so this function handles distributing the remainder across the folds.
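
One common way to distribute the remainder is to give one extra vector to each of the first few folds. The sketch below assumes that strategy; computeFoldSizes.m may distribute the remainder differently, and the names vecsPerCat, numFolds and foldSizes are hypothetical.

```matlab
vecsPerCat = [23; 17];   % hypothetical number of vectors per category
numFolds = 10;

% foldSizes(i, j) = number of vectors of category i placed in fold j.
foldSizes = zeros(numel(vecsPerCat), numFolds);
for i = 1:numel(vecsPerCat)
    base  = floor(vecsPerCat(i) / numFolds);   % size every fold gets
    extra = mod(vecsPerCat(i), numFolds);      % leftover vectors
    foldSizes(i, :) = base;
    foldSizes(i, 1:extra) = base + 1;          % first 'extra' folds get one more
end
```

With 23 vectors and 10 folds, the first three folds hold 3 vectors and the remaining seven hold 2, so fold sizes within a category never differ by more than one.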

randSortAndGroup.m - Sorts the vectors by category, and randomizes the order of the vectors within each category.
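
A minimal sketch of grouping by category and shuffling within each group is shown below. It is an illustration under assumed variable names (X_sorted, y_sorted), not the actual body of randSortAndGroup.m.

```matlab
X = (1:7)' * [1 10];          % hypothetical 7x2 data, row r = [r, 10*r]
y = [0; 1; 0; 1; 1; 0; 0];
categories = [0; 1];

X_sorted = zeros(size(X));
y_sorted = zeros(size(y));
pos = 1;                                  % next free row in the output
for i = 1:numel(categories)
    idx = find(y == categories(i));       % rows belonging to this category
    idx = idx(randperm(numel(idx)));      % random order within the category
    n = numel(idx);
    X_sorted(pos:pos+n-1, :) = X(idx, :);
    y_sorted(pos:pos+n-1) = y(idx);
    pos = pos + n;
end
```

Each row of X moves together with its label, so the pairing between data points and categories is preserved.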

getFoldVectors.m - For the specified round of cross-validation, selects X_train, y_train (the vectors to use for training, with their associated categories) and X_val, y_val (the vectors to use for validation, with their associated categories).
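
The selection for one round can be sketched as follows, assuming the data is already grouped by category and that a matrix of per-category fold sizes is available. The names foldSizes, roundNumber and isVal are hypothetical and the real getFoldVectors.m interface may differ.

```matlab
% Hypothetical inputs: rows already grouped by category.
y_sorted = [0; 0; 0; 0; 0; 1; 1; 1; 1];
X_sorted = (1:9)';              % one feature, equal to the row number
foldSizes = [2 2 1;             % category 0: 5 vectors over 3 folds
             2 1 1];            % category 1: 4 vectors over 3 folds
roundNumber = 2;                % fold used for validation in this round

isVal = false(size(y_sorted));
catStart = 1;                   % first row of the current category
for i = 1:size(foldSizes, 1)
    % Rows of category i that precede the validation fold.
    offset = sum(foldSizes(i, 1:roundNumber-1));
    first = catStart + offset;
    last  = first + foldSizes(i, roundNumber) - 1;
    isVal(first:last) = true;
    catStart = catStart + sum(foldSizes(i, :));
end

X_val   = X_sorted(isVal, :);   y_val   = y_sorted(isVal);
X_train = X_sorted(~isVal, :);  y_train = y_sorted(~isVal);
```

Taking the same fold number from every category keeps the class proportions in the validation set close to those of the full data set.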

mainCrossVal.m - This is the main function; it integrates calls to the functions above to create the k-fold cross-validation data sets. The actual training (using functions from the SETUPML folder) still needs to be performed, and the validation accuracy on the validation vectors needs to be computed using predictML from the SETUPML folder.
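
The overall loop might look like the sketch below. It is self-contained and therefore does not call the functions in this folder or in SETUPML: the fold assignment is done inline, and a simple nearest-centroid rule stands in for the SETUPML training functions and predictML. All data values and variable names are hypothetical.

```matlab
% Hypothetical toy data: 1 feature, two well-separated classes.
X = [0.1; 0.3; 0.2; 0.4; 0.0; 0.5; 2.1; 2.3; 2.2; 2.4; 2.0; 2.5];
y = [0; 0; 0; 0; 0; 0; 1; 1; 1; 1; 1; 1];
categories = [0; 1];
numFolds = 3;

% Assign each vector to a fold, category by category, so that every fold
% receives a similar share of each class.
foldId = zeros(size(y));
for i = 1:numel(categories)
    idx = find(y == categories(i));
    idx = idx(randperm(numel(idx)));
    foldId(idx) = mod((0:numel(idx)-1)', numFolds) + 1;
end

accuracy = zeros(numFolds, 1);
for roundNumber = 1:numFolds
    isVal = (foldId == roundNumber);
    X_train = X(~isVal, :);  y_train = y(~isVal);
    X_val   = X(isVal, :);   y_val   = y(isVal);

    % Placeholder "training": one mean vector per category
    % (stand-in for the SETUPML training functions).
    centroids = zeros(numel(categories), size(X, 2));
    for i = 1:numel(categories)
        centroids(i, :) = mean(X_train(y_train == categories(i), :), 1);
    end

    % Placeholder "prediction": nearest centroid (stand-in for predictML).
    y_pred = zeros(size(y_val));
    for r = 1:size(X_val, 1)
        diffs = centroids - repmat(X_val(r, :), numel(categories), 1);
        [~, best] = min(sum(diffs .^ 2, 2));
        y_pred(r) = categories(best);
    end

    accuracy(roundNumber) = mean(y_pred == y_val);
end
fprintf('Mean validation accuracy: %.2f\n', mean(accuracy));
```

Averaging the per-fold accuracies gives a single cross-validation estimate; in practice the placeholder steps would be replaced by the actual training and predictML calls.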